An Approach for Deriving Semantically Related Category Hierarchies from Wikipedia Category Graphs
نویسندگان
چکیده
Wikipedia is the largest online encyclopedia known to date. Its rich content and semi-structured nature has made it into a very valuable research tool used for classification, information extraction, and semantic annotation, among others. Many applications can benefit from the presence of a topic hierarchy in Wikipedia. However, what Wikipedia currently offers is a category graph built through hierarchical category links the semantics of which are undefined. Because of this lack of semantics, a sub-category in Wikipedia does not necessarily comply with the concept of a sub-category in a hierarchy. Instead, all it signifies is that there is some sort of relationship between the parent category and its sub-category. As a result, traversing the category links of any given category can often result in surprising results. For example, following the category of “Computing” down its sub-category links, the totally unrelated category of “Theology” appears. In this paper, we introduce a novel algorithm that through measuring the semantic relatedness between any given Wikipedia category and nodes in its sub-graph is capable of extracting a category hierarchy containing only nodes that are relevant to the parent category. The algorithm has been evaluated by comparing its output with a gold standard data set. The experimental setup and results are presented.
منابع مشابه
Building Multi-Modal Relational Graphs for Multimedia Retrieval
The abundance of Web 2.0 social media in various media formats calls for integration that takes into account tags associated with these resources. The authors present a new approach to multi-modal media search, based on novel related-tag graphs, in which a query is a resource in one modality, such as an image, and the results are semantically similar resources in various modalities, for instanc...
متن کاملA Support Tool for Deriving Domain Taxonomies from Wikipedia
Organizing data into category hierarchies (taxonomies) is useful for content discovery, search, exploration and analysis. In industrial settings targeted taxonomies for specific domains are mostly created manually, typically by domain experts, which is time consuming and requires a high level of expertise. This paper presents an algorithm and an implemented interactive system for automatically ...
متن کاملBuilding Multi-Modal Relational Graphs for Multimedia Retrieval
The abundance of Web 2.0 social media in various media formats calls for integration that takes into account tags associated with these resources. The authors present a new approach to multi-modal media search, based on novel related-tag graphs, in which a query is a resource in one modality, such as an image, and the results are semantically similar resources in various modalities, for instanc...
متن کاملMining Relations between Wikipedia Categories
The paper concerns the problem of automatic category system creation for a set of documents connected with references. Presented approach has been evaluated on the Polish Wikipedia, where two graphs: the Wikipedia category graph and article graph has been analyzed. The linkages between Wikipedia articles has been used to create a new category graph with weighted edges. We compare the created ca...
متن کاملA Category-Driven Approach to Deriving Domain Specific Subset of Wikipedia
While many researchers attempt to build up different kinds of ontologies by means of Wikipedia, the possibility of deriving highquality domain specific subset of Wikipedia using its own category structure still remains undervalued. We prove the necessity of such processing in this paper and also propose an appropriate technique. As a result, the size of knowledge base for our text processing fr...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2013